Biosynthesis of Largimycins in Streptomyces argillaceus Involves Transient β-Alkylation and Cryptic Halogenation Steps Unprecedented in the Leinamycin Family

Largimycins A1 and A2 are key members of a recently identified family of hybrid nonribosomal peptide-polyketides belonging to the scarcely represented group of antitumor leinamycins. They are encoded by the lrg gene cluster of Streptomyces argillaceus. This cluster contains a halogenase gene and two sets of genes for the biosynthesis and incorporation of β branches at C3 and C9. Notably, largimycins A1 and A2 are nonhalogenated compounds and contain a β branch only at C3. By generating mutants in those genes and chemically characterizing the compounds they accumulate, we confirmed the existence of a chlorination step at C19, the introduction of an acetyl-derived olefinic exomethylene group at C9, and a propionyl-derived β branch at C3 in the biosynthetic pathway. Since the olefinic exomethylene group and the chlorine atom are absent in the final products, those biosynthetic steps can be considered cryptic in the overall pathway but essential for generating the keto and epoxide functionalities at C9 and C18/C19, respectively. We propose that chlorination at C19 is used as an activation strategy that creates the halohydrin precursor that finally yields the epoxy functionality at C18/C19. This represents a novel strategy to create such functionalities and extends the small number of natural product biosynthetic pathways that include a cryptic chlorination step.


LRGs, based on biosynthetic and phylogenetic arguments,¹² as already described for the first LRGs discovered.¹¹ The absolute configuration at the hydroxylated position 3 was not determined.
Largimycin K2 (LRG K2, 2) was assigned the molecular formula C21H27ClN2O5 based on the observed ion [M+H]⁺ at m/z 423.1686 (calcd. for C21H28ClN2O5⁺ = 423.1681, Δm = 1.2 ppm) alongside its corresponding isotopic pattern, indicating 9 degrees of unsaturation. The structure of 2 was determined by detailed 1D (¹H) and 2D NMR (COSY, NOESY, HSQC and HMBC) spectroscopic analyses, further assisted by comparison with the NMR data of LRG K1 (1). Interpretation of the spectra revealed NMR features identical to those observed in 1 except for the lack of one aliphatic methylene and one hydroxylated methine group, which accounts for the formal "C2H4O" difference between the two molecular formulas. Key COSY and long-range HMBC correlations (Figure S14) rendered a structure almost identical to 1 but lacking the two terminal carbons, C1 and C2, and turning C3 into a free carboxylic acid group. Stereochemistry of the double bonds was determined as described above for 1.
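As a minimal sketch of the arithmetic behind a formula assignment of this kind, the following Python snippet recomputes the calculated m/z of the protonated ion, the mass error in ppm, and the degrees of unsaturation for LRG K2 (2). The function names are hypothetical, and the isotope masses are standard reference values rather than data from this work. The degrees-of-unsaturation expression assumes that only C, H, N, O, S and halogens are present.

```python
import re

# Monoisotopic masses (u) of the most abundant isotopes.
# Standard reference values, not taken from the source document.
MONO = {"C": 12.0, "H": 1.00782503, "N": 14.00307401,
        "O": 15.99491462, "S": 31.97207117, "Cl": 34.96885268}
ELECTRON = 0.00054858  # electron mass (u)
HALOGENS = {"F", "Cl", "Br", "I"}


def parse_formula(formula):
    """Turn a flat formula such as 'C21H27ClN2O5' into {element: count}."""
    counts = {}
    for element, number in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[element] = counts.get(element, 0) + int(number or 1)
    return counts


def cation_mz(ion_formula):
    """m/z of a singly charged cation given the full ion formula."""
    counts = parse_formula(ion_formula)
    return sum(MONO[el] * n for el, n in counts.items()) - ELECTRON


def ppm_error(observed, calculated):
    """Relative mass error in parts per million."""
    return (observed - calculated) / calculated * 1e6


def degrees_of_unsaturation(neutral_formula):
    """(2C + 2 + N - H - X) / 2; O and S do not contribute."""
    c = parse_formula(neutral_formula)
    halogens = sum(n for el, n in c.items() if el in HALOGENS)
    return (2 * c.get("C", 0) + 2 + c.get("N", 0) - c.get("H", 0) - halogens) / 2


calculated = cation_mz("C21H28ClN2O5")          # [M+H]+ ion of LRG K2
print(round(calculated, 4))
print(round(ppm_error(423.1686, calculated), 2))
print(degrees_of_unsaturation("C21H27ClN2O5"))
```

For 2 this reproduces the calculated value of 423.1681 and 9 degrees of unsaturation. The ppm error comes out between about 1.1 and 1.2 ppm, depending on whether the calculated mass is rounded to four decimals before the comparison. The same functions can be applied to the other ions reported here by passing the corresponding ion formula (for example the [M+NH4]⁺ formula for ammonium adducts).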
Largimycin K3 (LRG K3, 3) was assigned the molecular formula C22H31ClN2O4 based on the observed ion [M+H]⁺ at m/z 423.2047 (calcd. for C22H32ClN2O4⁺ = 423.2045, Δm = 0.5 ppm) alongside its corresponding isotopic pattern, indicating 8 degrees of unsaturation. The structure of 3 was determined by detailed 1D (¹H) and 2D NMR (COSY, HSQC and HMBC) spectroscopic analyses, further assisted by comparison with the NMR data of LRG K1 (1). Although the NMR sample contained significant impurities, careful interpretation of the spectra revealed NMR features identical to those observed in 1 except for the lack of the free carboxylic acid group and one aliphatic methylene, concomitant with the presence of a new methyl as the first carbon atom of the chain. These differences account for the formal "CO2" difference between the two molecular formulas. Key COSY and long-range HMBC correlations (Figure S14) rendered a structure corresponding formally to a decarboxylated form of 1. Because the NOESY spectrum was not acquired, the stereochemistry of the double bonds was assigned based on the measured coupling constants and chemical shift comparisons with 1. The absolute configuration at the hydroxylated position 3 was not determined but must be identical to that in 1 based on their common biosynthetic origin.
Interpretation of the spectra revealed NMR features identical to those observed in 3 except for the lack of one hydroxylated methine group, concomitant with the presence of a new ketone group. These differences account for the formal "H2" difference between the molecular formulas of 4 and 3, suggesting that 4 is identical to 3 but with a higher oxidation state at C3 (ketone vs. alcohol). Key COSY and long-range HMBC correlations (Figure S14) confirmed the expected connectivity. Stereochemistry of the double bonds was determined as described above for 1.
Largimycin M1 (LRG M1, 5) was assigned the molecular formula C16H17ClN2O5 based on the observed ion [M+H]⁺ at m/z 353.0896 (calcd. for C16H18ClN2O5⁺ = 353.0899, Δm = 0.8 ppm) alongside its corresponding monochlorinated isotopic pattern, indicating 9 degrees of unsaturation. The structure of 5 was determined by detailed 1D (¹H) and 2D NMR (COSY, NOESY, HSQC and HMBC) spectroscopic analyses, further assisted by comparison with the NMR data of already reported LRGs¹¹ and LRG K1 (1). Interpretation of the HSQC and HMBC spectra revealed the presence of 6 quaternary carbons (including one ester/amide carbonyl at δC 164.4 and five sp² carbons in the range δC 99–164), 8 methines (including six olefinic/aromatic carbons, one oxygenated methine and one methine likely corresponding to the α-CH of an amino acid moiety), one olefinic methylene, one aliphatic methylene and a methyl group (as an olefinic double bond substituent). Analysis of COSY correlations identified different spin systems, which could be connected using the key long-range correlations observed in the HMBC spectrum (Figure S15). To meet the determined molecular formula, the chlorine atom was again assigned as a C19 substituent based on the observed chemical shifts for that methylene, rendering a formal γ-chlorinated threonine residue. The spin system comprising H10 to H13 contains four olefinic protons corresponding to two E double bonds, as indicated by the measured coupling constants and key NOEs (Figure S15). This spin system is conjugated on the H13 end with the aromatic oxazole heterocycle characteristic of LRGs, as revealed by the key HMBC correlations between H15 and C13, C14 and C16. On the H10 end, the same spin system is conjugated to an α-pyrone moiety, as revealed by the key HMBC correlations between H13 and C8, C9, between H8 and C9, C7, C6, and between H21 and C5, C6, C7. This α-pyrone motif accounts for the difference observed in the UV spectrum of 5 compared to that of known LRGs and the previously described LRGs K1, K2, K3 and K4. Once again, the absolute configuration at C17 and C18 has been assigned to be the same as that found in L-Thr, based on the same arguments as indicated for the previous LRGs.
Largimycin M2 (LRG M2, 6) was assigned the molecular formula C16H17ClN2O5 based on the observed ion [M+H]⁺ at m/z 353.0897 (calcd. for C16H18ClN2O5⁺ = 353.0899, Δm = 0.6 ppm) alongside its corresponding isotopic pattern, indicating 9 degrees of unsaturation. The molecular formula is identical to that of LRG M1 (5). The structure of 6 was determined by detailed 1D (¹H) and 2D NMR (COSY, NOESY, HSQC and HMBC) spectroscopic analyses, further assisted by comparison with the NMR data of LRG M1 (5) and LRG K1 (1). Interpretation of the spectra revealed NMR features identical to those observed in 5 but with important differences in the chemical shifts of the olefinic protons. Key COSY and long-range HMBC correlations (Figure S15) confirmed connectivity identical to that of LRG M1 (5) but differing in the stereochemistry of the Δ¹² double bond, which is now Z (as opposed to the E stereochemistry found in 5), as revealed by the coupling constant of 11.1 Hz between H12 and H13. LRG M2 thus follows the trend of all LRGs described previously except LRG M1.
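The use of a vicinal coupling constant to distinguish Z from E geometry can be illustrated with a short, hedged sketch. The function name and the cut-off values below are assumptions based on the common rule of thumb for disubstituted alkenes (cis couplings of roughly 6 to 12 Hz, trans couplings of roughly 12 to 18 Hz); they are not thresholds stated in this work, and in practice the assignment is supported by NOE data and chemical shift comparisons as described above.

```python
def alkene_geometry_from_j(j_hz, cis_max=12.0, trans_min=14.0):
    """Rough geometry call from a vicinal olefinic 3J(H,H) value in Hz.

    cis_max and trans_min are rule-of-thumb cut-offs (assumptions);
    values between them are reported as ambiguous rather than forced.
    """
    if j_hz <= cis_max:
        return "Z"
    if j_hz >= trans_min:
        return "E"
    return "ambiguous"


# The 11.1 Hz coupling between H12 and H13 reported for LRG M2 (6)
print(alkene_geometry_from_j(11.1))
```

Leaving a gap between the two cut-offs is a deliberate choice: couplings near 12 to 14 Hz are not diagnostic on their own and need additional evidence.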
Largimycin M3 (LRG M3, 7) was assigned the molecular formula C18H19ClN2O6 based on the observed ion [M+H]⁺ at m/z 395.1002 (calcd. for C18H20ClN2O6⁺ = 395.1004, Δm = 0.5 ppm) alongside its corresponding isotopic pattern, indicating 10 degrees of unsaturation. The structure of 7 was determined by detailed 1D (¹H) and 2D NMR (COSY, NOESY, HSQC and HMBC) spectroscopic analyses, further assisted by comparison with the NMR data of LRG M1 (5). Interpretation of the spectra revealed NMR features identical to those observed in 5 with the additional presence of an N-acetyl group (on the amino group of the Thr residue), which accounts for the formal "C2H2O" difference between the molecular formulas of 7 and 5. Key COSY and long-range HMBC correlations (Figure S15) confirmed the structure of LRG M3 (7) as an N-acetylated version of LRG M1 (5).
Stereochemistry of the double bonds was determined as described above for 5.
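The formal formula differences quoted for these compounds are simple element-count arithmetic, sketched below in Python. The helper names are hypothetical. The first check uses the formulas of LRG M3 (7) and LRG M1 (5) given above. The second check relies on an inference rather than a value stated in this section: since K2 and K3 are described as differing from LRG K1 (1) by "C2H4O" and "CO2" respectively, adding those fragments back should give the same formula in both cases.

```python
import re
from collections import Counter


def parse_formula(formula):
    """Turn a flat formula such as 'C18H19ClN2O6' into a Counter of elements."""
    counts = Counter()
    for element, number in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[element] += int(number or 1)
    return counts


def formula_difference(larger, smaller):
    """Element-wise difference larger - smaller, omitting unchanged elements."""
    a, b = parse_formula(larger), parse_formula(smaller)
    return {el: a[el] - b[el] for el in sorted(set(a) | set(b)) if a[el] != b[el]}


# LRG M3 (7) versus LRG M1 (5): expected to differ by one acetyl unit, C2H2O
print(formula_difference("C18H19ClN2O6", "C16H17ClN2O5"))

# Consistency check (inference, see lead-in): K2 + C2H4O versus K3 + CO2
k1_from_k2 = parse_formula("C21H27ClN2O5") + parse_formula("C2H4O")
k1_from_k3 = parse_formula("C22H31ClN2O4") + parse_formula("CO2")
print(k1_from_k2 == k1_from_k3)
```

Both routes give the same element counts, so the two stated differences relative to 1 are mutually consistent.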
Largimycin H1 (LRG H1, 8) was assigned the molecular formula C23H26N2O9S based on the observed ion [M+NH4]⁺ at m/z 524.1700 (calcd. for C23H30N3O9S⁺ = 524.1697, Δm = 0.6 ppm) alongside its corresponding isotopic pattern, indicating 12 degrees of unsaturation. The structure of 8 was established after detailed 1D (¹H) and 2D NMR (COSY, NOESY, HSQC and HMBC) spectroscopic analyses, further assisted by comparisons with the NMR data of LRG A4 and LRG A1.¹¹ The NMR spectroscopic data of 8 (Table S9 and Figure S12) were remarkably similar to those of LRG A4 with one important difference: the epoxide resonances corresponding to positions 18 and 19 in LRG A4 are absent in 8, which instead displays one methyl and one hydroxylated methine signal at those positions (corresponding to a formal Thr residue). The pattern of COSY and HMBC correlations of 8 (Figure S16) corroborated the almost identical connectivity of LRG H1 and LRG A4. The oxime double bond was assigned a Z stereochemistry based on the observed δC 68.2 for C18 and comparison with the empirical chemical shift prediction obtained for the two possible E/Z configurations of the oxime double bond, as described for the first LRGs discovered.¹¹ Not surprisingly, the NOESY correlations observed for 8 (Figure S16) also match those found in LRG A1 and A4, indicating the expected identical relative configuration. The absolute configuration of the chiral centers at C3 and C18 was assigned to be the same as for the previously reported LRGs, based on their common biosynthetic origin and the phylogenetic arguments already described for the first LRGs discovered.¹¹
Largimycin H2 (LRG H2, 9) was assigned the molecular formula C28H33N3O11S2 based on the observed ion [M+NH4]⁺ at m/z 669.1894 (calcd. for C28H37N4O11S2⁺ = 669.1895, Δm = 0.1 ppm) alongside its corresponding isotopic pattern, indicating 14 degrees of unsaturation. The extra sulfur atom in the molecular formula compared to 8 suggested the presence of a CysNAc moiety, as found in LRG A1 and LRG A2. The structure of 9 was established after detailed 1D (¹H) and 2D NMR (COSY, NOESY, HSQC and HMBC) spectroscopic analyses, further assisted by comparisons with the NMR data of LRG A2¹¹ and LRG H1. The NMR spectroscopic data of 9 (Table S10 and Figure S13) were remarkably similar to those of LRG A2, again with the same difference as between LRG H1 and LRG A4: the epoxide resonances corresponding to positions 18 and 19 in LRG A2 are absent in 9, which instead displays one methyl and one hydroxylated methine signal at those positions (corresponding to a formal Thr residue), as found in LRG H1. The pattern of COSY and HMBC correlations of 9 (Figure S16) corroborated the almost identical connectivity of LRG H2 and LRG A2. The oxime double bond was assigned a Z stereochemistry based on the observed δC 67.7 for C18 and comparison with the empirical chemical shift prediction obtained for the two possible E/Z configurations of the oxime double bond, as indicated above for LRG H1 and as described for the first LRGs discovered.¹¹ Not surprisingly, the NOESY correlations observed for 9 (Figure S16) also match those found in LRG A2, indicating the expected identical relative configuration. The absolute configuration of the chiral centers at C3 and C18 was assigned to be the same as for LRG H1, based on their common biosynthetic origin, and the absolute configuration of position 2' of the S-conjugated CysNAc unit was assigned to be S, since this moiety derives from mycothiol, as described for the first LRGs discovered.¹¹

[Figure labels: S. argillaceus ΔlrgKLM1; Mut.orf39-41c up; Mut.orf39-41c rp]
Figure S16. Key COSY correlations (bold bonds) and ¹H to ¹³C HMBC correlations (blue arrows) determining the connectivity of LRG H1 (8) and H2 (9). Key NOEs (red arrows) employed alongside ³JHH and chemical shift comparisons to determine the Z/E stereochemistry of the double bonds and the relative configuration of the chiral centers. Absolute configuration at positions 17 and 18 matches that of L-Thr based on the biosynthetic pathway of LRGs. Absolute configuration at position 3 is based on common biosynthetic origin as already reported for known LRGs. Energy-minimized molecular models of both compounds are shown indicating the measured distances (in Å) related to the observed key NOEs.